1)¶

Importing Libraries¶

In [1]:
import pandas as pd
import plotly.express as plx
from raceplotly.plots import barplot
from tqdm import tqdm
import warnings
from datetime import datetime, timedelta
import plotly.io as pio
import nltk
import json
import pickle
from nltk.corpus import stopwords
pio.renderers.default = "notebook"
warnings.filterwarnings("ignore")

Importing complaints.csv¶

In [2]:
comp_df = pd.read_csv('complaints.csv/complaints.csv',low_memory=False)
comp_df
Out[2]:
Date received Product Sub-product Issue Sub-issue Consumer complaint narrative Company public response Company State ZIP code Tags Consumer consent provided? Submitted via Date sent to company Company response to consumer Timely response? Consumer disputed? Complaint ID
0 2024-03-13 Credit card General-purpose credit card or charge card Advertising and marketing, including promotion... Didn't receive advertised or promotional terms I formally request the removal of an inaccurat... NaN EQUIFAX, INC. MO 631XX NaN Consent provided Web 2024-03-13 Closed with non-monetary relief Yes NaN 8538710
1 2024-03-14 Credit reporting or other personal consumer re... Credit reporting Incorrect information on your report Information belongs to someone else NaN NaN EQUIFAX, INC. PA 19142 NaN Consent not provided Web 2024-03-14 Closed with non-monetary relief Yes NaN 8551289
2 2023-10-25 Credit reporting or other personal consumer re... Credit reporting Improper use of your report Reporting company used your report improperly NaN Company has responded to the consumer and the ... TRANSUNION INTERMEDIATE HOLDINGS, INC. CA 92114 NaN Consent not provided Web 2023-10-25 Closed with explanation Yes NaN 7755832
3 2023-10-25 Mortgage FHA mortgage Struggling to pay mortgage An existing modification, forbearance plan, sh... NaN Company has responded to the consumer and the ... Specialized Loan Servicing Holdings LLC CA 956XX NaN Other Web 2023-10-25 Closed with explanation Yes NaN 7753166
4 2024-05-02 Money transfer, virtual currency, or money ser... Domestic (US) money transfer Money was not available when promised NaN NaN NaN Sigue Corp. TX 78644 NaN Other Web 2024-05-02 In progress Yes NaN 8916876
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5242995 2023-08-02 Credit reporting, credit repair services, or o... Credit reporting Incorrect information on your report Information belongs to someone else NaN NaN EQUIFAX, INC. TX 76036 NaN Other Web 2023-08-02 Closed with non-monetary relief Yes NaN 7337991
5242996 2023-08-02 Payday loan, title loan, or personal loan Installment loan Problem with additional add-on products or ser... NaN NaN NaN NBT BANCORP INC. NJ 08046 NaN NaN Phone 2023-08-02 Closed with explanation Yes NaN 7343156
5242997 2023-08-02 Mortgage FHA mortgage Struggling to pay mortgage NaN NaN Company believes it acted appropriately as aut... Chesmar Homes TX 77316 NaN Other Web 2023-08-17 Closed with explanation Yes NaN 7340709
5242998 2023-08-02 Credit reporting, credit repair services, or o... Credit reporting Improper use of your report Reporting company used your report improperly NaN NaN EQUIFAX, INC. GA 30013 NaN Consent not provided Web 2023-08-02 Closed with explanation Yes NaN 7341166
5242999 2023-08-04 Payday loan, title loan, or personal loan Payday loan Charged fees or interest you didn't expect NaN NaN NaN Flandreau Santee Sioux Tribe TX 75057 NaN Consent not provided Web 2023-08-18 Closed with explanation Yes NaN 7350142

5243000 rows × 18 columns

Looking at the information of DataFrame using info() in pandas package¶

In [3]:
comp_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5243000 entries, 0 to 5242999
Data columns (total 18 columns):
 #   Column                        Dtype 
---  ------                        ----- 
 0   Date received                 object
 1   Product                       object
 2   Sub-product                   object
 3   Issue                         object
 4   Sub-issue                     object
 5   Consumer complaint narrative  object
 6   Company public response       object
 7   Company                       object
 8   State                         object
 9   ZIP code                      object
 10  Tags                          object
 11  Consumer consent provided?    object
 12  Submitted via                 object
 13  Date sent to company          object
 14  Company response to consumer  object
 15  Timely response?              object
 16  Consumer disputed?            object
 17  Complaint ID                  int64 
dtypes: int64(1), object(17)
memory usage: 720.0+ MB

Total null values in the particular column¶

In [4]:
try:
    with open('pickle files/df_null_info.pkl', 'rb') as f:
        df_null_info = pickle.load(f)
except:
    col_name, total_null= [], []
    for x in comp_df.columns:
        col_name.append(x)
        total_null.append(sum(comp_df[x].isnull()))
    df_null_info = pd.DataFrame()
    df_null_info['column_name'], df_null_info['Total_null'] = col_name, total_null
    with open('pickle files/df_null_info.pkl', 'wb') as f:
        pickle.dump(df_null_info, f)

df_null_info
Out[4]:
column_name Total_null
0 Date received 0
1 Product 0
2 Sub-product 235292
3 Issue 3
4 Sub-issue 736164
5 Consumer complaint narrative 3406287
6 Company public response 2729863
7 Company 0
8 State 45780
9 ZIP code 30225
10 Tags 4757083
11 Consumer consent provided? 1041536
12 Submitted via 0
13 Date sent to company 0
14 Company response to consumer 15
15 Timely response? 0
16 Consumer disputed? 4474684
17 Complaint ID 0

The above dataframe shows columns that don't have a null. The rest of them have a huge null count. Here, mostly, the datatype of the column is string. So first, we're looking at the unique counts of each column.¶

1.1 Visualization of null count of each column in a dataset :¶

In [5]:
plx.bar(df_null_info, x = 'column_name', y='Total_null')

Insight :¶

  • Most of the values in the column 'Tags' are null.

  • Whereas, 'Date received', 'Product', 'Company', 'Submitted via', 'Date sent to company', 'Timely response?' and 'Complaint ID' has zero null values.

In [6]:
try:
    with open('pickle files/df_unique_count.pkl', 'rb') as f:
        df_unique_count = pickle.load(f)
except:
    col_name, total_unique = [], []
    for x in comp_df.columns:
        col_name.append(x)
        total_unique.append(comp_df[x].nunique())
    df_unique_count = pd.DataFrame()
    df_unique_count['column_name'], df_unique_count['Total_unique'] = col_name, total_unique
    with open('pickle files/df_unique_count.pkl', 'wb') as f:
        pickle.dump(df_unique_count, f)

df_unique_count
Out[6]:
column_name Total_unique
0 Date received 4550
1 Product 21
2 Sub-product 86
3 Issue 178
4 Sub-issue 272
5 Consumer complaint narrative 1492267
6 Company public response 11
7 Company 7220
8 State 63
9 ZIP code 33657
10 Tags 3
11 Consumer consent provided? 4
12 Submitted via 7
13 Date sent to company 4499
14 Company response to consumer 8
15 Timely response? 2
16 Consumer disputed? 2
17 Complaint ID 5243000

The above Dataframe shows the number of unique values in each column of the dataset.¶

Exploratory Data Analysis¶

1.2 Visualization for Number of complaints in each day(over time) :¶

In [7]:
fig = plx.bar(pd.DataFrame(comp_df['Date received'].value_counts()))
fig.update_traces(dict(marker_line_width=0))
fig.update_layout(
        title="No of Complaints in a day over time",
        xaxis_title="Time",
        yaxis_title="Total Complaints",
        font=dict(
            family="Courier New, monospace",
            size=13,
            color="RebeccaPurple"
        )
    )
fig.show()

Insight :¶

The number of complaints increased over time.¶

1.3 Visualization for Number of complaints for desired product : ¶

In [8]:
df_1 = pd.DataFrame(comp_df['Product'].value_counts(dropna=False))
fig = plx.pie(df_1, values=df_1['Product'], names=df_1.index)
fig.update_layout(
        title="Total number of complaints for the desired product",
        font=dict(
            family="Courier New, monospace",
            size=13,
            color="RebeccaPurple"
        )
    )
fig.show()

Insight :¶

  • "Credit reporting, credit repair services, or other personal consumer reports" contribute 41.3% of total complaints.

     
  • Whereas "virtual currency" contributes the least amount of 0.000343% of total complaints.

     

1.4 Visualization for Number of complaints for desired Sub-product : ¶

In [9]:
fig = plx.bar(pd.DataFrame(comp_df['Sub-product'].value_counts() ), orientation='h' )
fig.update_layout(
        title="Total number of complaints for the desired Sub-product",
        height = 700,
        font=dict(
            family="Courier New, monospace",
            size=13,
            color="RebeccaPurple"
        )
    )
fig.show()

Insight :¶

  • "Credit reporting" category has more than 3 million complaints out of 5 million complaints (total records).

     
  • Whereas the "Electronic Benifit Transfer / EBT card" category has the fewest complaints.

     

1.5 Visualization for Number of complaints for desired Issue : ¶

In [10]:
fig = plx.bar(pd.DataFrame(comp_df['Issue'].value_counts(dropna=False) ), orientation='h' )
fig.update_layout(
        title="Total number of complaints for the desired Issue",
        height = 750,
        font=dict(
            family="Courier New, monospace",
            size=13,
            color="RebeccaPurple"
        )
    )
fig.show()

Insight :¶

  • The "Incorrect information on your report" category has more than 1.5 million complaints.

  • Whereas "Lender sold the property" has the least number of complaints.

     

1.6 Visualization for Number of complaints for desired Sub-issue : ¶

In [11]:
fig = plx.bar(pd.DataFrame(comp_df['Sub-issue'].value_counts(dropna=False) ), orientation='h' )
fig.update_layout(
        title="Total number of complaints for the desired Sub-issue",
        font=dict(
            family="Courier New, monospace",
            size=12,
            color="RebeccaPurple"
        )
    )
fig.show()

Insight :¶

  • The "Information belongs to someone else" category has more than 1 million complaints out of 5 million complaints (total records).

     
  • Whereas the "Problem with fraud alerts or security freezes" have the least number of complaints.

     

1.7 Visualization for Number of complaints for desired 'Company public response' : ¶

In [12]:
df_2 = pd.DataFrame(comp_df['Company public response'].value_counts(dropna=False))
fig = plx.pie(df_2, values=df_2['Company public response'], names=df_2.index)
fig.update_layout(
        title="Total number of complaints for the desired \"Company public response\" " ,
        margin=dict(t=0, b=0, l=0, r=0),
        font=dict(
            family="Courier New, monospace",
            size=11,
            color="RebeccaPurple"
        )
    )
fig.show()

Insight :¶

  • 52.1% of the dataset is null in 'Company public response' column.

  • 43.2% belong to the category "company has responded to the consumer and the CFPB and chooses not to provide a public response."

  • Rest of the categories has fewer contribution.

1.8 Visualization for Number of complaints for desired 'Company' : ¶

In [13]:
df_3 = pd.DataFrame(comp_df['Company'].value_counts() )
df_3 = df_3.reset_index()
df_3['Company'], df_3['count'] = df_3['index'], df_3['Company']
df_3.drop(['index'], 1, inplace=True)
df_3['contribution %'] = [((x/len(comp_df)) * 100) for x in df_3['count']]
df_3.head(5)
Out[13]:
Company count contribution %
0 EQUIFAX, INC. 1094863 20.882377
1 TRANSUNION INTERMEDIATE HOLDINGS, INC. 1016332 19.384551
2 Experian Information Solutions Inc. 925932 17.660347
3 BANK OF AMERICA, NATIONAL ASSOCIATION 140733 2.684208
4 WELLS FARGO & COMPANY 128879 2.458116

Insight :¶

The above dataframe shows that the huge amount of complaints(% in double digit) belongs to¶

  • "EQUIFAX, INC."(20.88%),

  • "TRANSUNION INTERMEDIATE HOLDINGS, INC. "(19.38%), and

  • "Experian Information Solutions Inc."(17.66%).

  • Rest of the companies contributes a few percentage in the total dataset.

1.9 Visualization for Number of complaints for 'State' : ¶

In [14]:
df_4 = pd.DataFrame(comp_df['State'].value_counts() )
df_4 = df_4.reset_index()
df_4['State'], df_4['count'] = df_4['index'], df_4['State']
df_4.drop(['index'], 1, inplace=True)
df_4
Out[14]:
State count
0 FL 629941
1 CA 597278
2 TX 560364
3 GA 360174
4 NY 336721
... ... ...
58 AA 59
59 MP 57
60 AS 50
61 MH 33
62 PW 13

63 rows × 2 columns

In [15]:
fig = plx.choropleth(df_4, locations='State',locationmode='USA-states', color='count',
                           color_continuous_scale="Viridis",
                           range_color=(0, max(df_4['count'])),
                           scope="north america",
                           labels={'Complaints': 'Complaints Count'}
                          )
fig.update_layout(
    title = "Total number of complaints for the \"State\" in USA " ,
    height = 700,
    margin = dict(t=0, b=0, l=0, r=0),
    font = dict(
        family="Courier New, monospace",
        size=11,
        color="RebeccaPurple"
    ) 
)
fig.show()

The above map shows total count of complaints for each state in North America.¶

NOTE 1 : Please move the mouse pointer to the shaded area to know the 'State' and 'Total Complaints'.¶

NOTE 2 : The states which has military base(AP, AA, etc.,) are not shown here, due to privacy issue.¶

NOTE 3 : The NAN count is excluded in map chart.¶

Insight :¶

  • Huge amount of complaints from the State "FL" (Florida), around 0.63 million.

  • Next to the 'FL', the state "CA" (California) has 0.6 million complaints.

  • Next to the 'CA', the state "TX" (Texas) has 0.56 million complaints.

1.10 Number of complaints for desired 'ZIP code' : ¶

In [16]:
df_5 = pd.DataFrame(comp_df['ZIP code'].value_counts() )
df_5 = df_5.reset_index()
df_5['ZIP code'], df_5['count'] = df_5['index'], df_5['ZIP code']
df_5['contribution %'] = [((x/len(comp_df)) * 100) for x in df_5['count']]
df_5.drop(['index'], 1, inplace=True)
df_5
Out[16]:
ZIP code count contribution %
0 XXXXX 120483 2.297978
1 30349 9723 0.185447
2 19143 8231 0.156990
3 33025 7345 0.140092
4 33311 6725 0.128266
... ... ... ...
33652 46200 1 0.000019
33653 87470 1 0.000019
33654 85062 1 0.000019
33655 98141 1 0.000019
33656 98933 1 0.000019

33657 rows × 3 columns

Insight :¶

  • There are more than 1.2 lakh zip code of "XXXXX". Company hides that particular ZIP code with 'X' mark.

1.10.1 Visualization for Number of complaints for desired 'ZIP code' which has 'X' mark: ¶

In [17]:
try:
    with open('pickle files/df_5_temp_1.pkl', 'rb') as f:
        df_5_temp_1 = pickle.load(f)
except:
    df_5_temp_1 = pd.DataFrame(comp_df)
    df_5_temp_1 = df_5_temp_1.where(df_5_temp_1['ZIP code'].str.contains('X'))
    df_5_temp_1.dropna(how='all',inplace=True)
    with open('pickle files/df_5_temp_1.pkl', 'wb') as f:
        pickle.dump(df_5_temp_1, f)
df_5_temp_1
Out[17]:
Date received Product Sub-product Issue Sub-issue Consumer complaint narrative Company public response Company State ZIP code Tags Consumer consent provided? Submitted via Date sent to company Company response to consumer Timely response? Consumer disputed? Complaint ID
0 2024-03-13 Credit card General-purpose credit card or charge card Advertising and marketing, including promotion... Didn't receive advertised or promotional terms I formally request the removal of an inaccurat... NaN EQUIFAX, INC. MO 631XX NaN Consent provided Web 2024-03-13 Closed with non-monetary relief Yes NaN 8538710.0
3 2023-10-25 Mortgage FHA mortgage Struggling to pay mortgage An existing modification, forbearance plan, sh... NaN Company has responded to the consumer and the ... Specialized Loan Servicing Holdings LLC CA 956XX NaN Other Web 2023-10-25 Closed with explanation Yes NaN 7753166.0
13 2024-01-19 Credit reporting or other personal consumer re... Credit reporting Improper use of your report Reporting company used your report improperly 15USC 1681 a ( d ) ( 2 ) ( A ) ( i ) clearly s... Company has responded to the consumer and the ... Experian Information Solutions Inc. MD 218XX NaN Consent provided Web 2024-01-19 Closed with explanation Yes NaN 8183526.0
15 2024-05-03 Credit reporting or other personal consumer re... Credit reporting Problem with a company's investigation into an... Their investigation did not fix an error on yo... NaN NaN Experian Information Solutions Inc. NY 117XX NaN NaN Web 2024-05-03 In progress Yes NaN 8913449.0
20 2023-09-15 Credit reporting or other personal consumer re... Credit reporting Incorrect information on your report Information belongs to someone else I am writing to formally dispute and request c... Company has responded to the consumer and the ... Experian Information Solutions Inc. MO 630XX NaN Consent provided Web 2023-09-15 Closed with non-monetary relief Yes NaN 7549549.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5242837 2021-10-16 Credit card or prepaid card General-purpose credit card or charge card Closing your account Company closed your account I sent Capital One written correspondent about... NaN CAPITAL ONE FINANCIAL CORPORATION TX 799XX NaN Consent provided Web 2021-10-16 Closed with explanation Yes NaN 4815687.0
5242838 2021-10-16 Money transfer, virtual currency, or money ser... Mobile or digital wallet Managing, opening, or closing your mobile wall... NaN NaN NaN Paypal Holdings, Inc NaN XXXXX NaN Consent not provided Web 2021-10-16 Closed with explanation Yes NaN 4814403.0
5242840 2021-10-16 Debt collection I do not know Written notification about debt Didn't receive notice of right to dispute NaN NaN The CBE Group, Inc. MN 563XX NaN Other Web 2021-10-16 Closed with explanation Yes NaN 4815034.0
5242844 2021-10-18 Debt collection Other debt Attempts to collect debt not owed Debt was result of identity theft NaN NaN Bread Financial Holdings, Inc. WV 253XX NaN Other Web 2021-10-18 Closed with explanation Yes NaN 4820567.0
5242891 2021-11-20 Mortgage Conventional home mortgage Struggling to pay mortgage NaN NaN NaN 21ST MORTGAGE CORP. AR 716XX NaN Other Web 2021-11-20 Closed with explanation Yes NaN 4932173.0

549537 rows × 18 columns

Insight :¶

  • Around 0.55 million complaints/records contains the 'X' mark in the "ZIP code" column.

In [18]:
try:
    with open('pickle files/df_5_temp_2.pkl', 'rb') as f:
        df_5_temp_2 = pickle.load(f)
except:
    df_5_temp_2 = pd.DataFrame(df_5_temp_1['State'].value_counts() )
    df_5_temp_2 = df_5_temp_2.reset_index()
    df_5_temp_2['State'], df_5_temp_2['hidden_count'] = df_5_temp_2['index'], df_5_temp_2['State']
    total_complaints = []
    for x in df_5_temp_2['State']:
        df_5_temp_3 = comp_df.where(comp_df['State'] == x)
        df_5_temp_3.dropna(how='all',inplace=True)
        total_complaints.append(len(df_5_temp_3))
    df_5_temp_2['Total Complaints'] = total_complaints
    df_5_temp_2['Hidden rate %'] = list((df_5_temp_2['hidden_count'] / df_5_temp_2['Total Complaints']) * 100)
    df_5_temp_2.drop(['index'], 1, inplace=True)
    with open('pickle files/df_5_temp_2.pkl', 'wb') as f:
        pickle.dump(df_5_temp_2, f)
        
df_5_temp_2
Out[18]:
State hidden_count Total Complaints Hidden rate %
0 FL 51888 629941 8.236962
1 TX 48719 560364 8.694170
2 CA 37114 597278 6.213857
3 NY 29544 336721 8.774030
4 PA 28872 248582 11.614678
... ... ... ... ...
57 AA 25 59 42.372881
58 MP 19 57 33.333333
59 FM 16 116 13.793103
60 AS 8 50 16.000000
61 MH 3 33 9.090909

62 rows × 4 columns

In [19]:
df_5_temp_2.sort_values('Hidden rate %', inplace=True, ascending=False)
df_5_temp_2.reset_index(inplace=True)
df_5_temp_2
Out[19]:
index State hidden_count Total Complaints Hidden rate %
0 56 GU 163 353 46.175637
1 54 AP 332 749 44.325768
2 51 AE 554 1298 42.681048
3 57 AA 25 59 42.372881
4 53 VI 361 987 36.575481
... ... ... ... ... ...
57 22 AZ 6861 93067 7.372108
58 41 DC 1660 22770 7.290294
59 5 GA 26241 360174 7.285645
60 2 CA 37114 597278 6.213857
61 31 NV 3897 77271 5.043289

62 rows × 5 columns

In [20]:
fig = plx.choropleth(df_5_temp_2, locations='State',locationmode='USA-states', color='Hidden rate %',
                           color_continuous_scale="Viridis",
                           range_color=(0, max(df_5_temp_2['Hidden rate %'])),
                           scope="north america",
                           labels={'Complaints': 'Hidden rate'}
                          )
fig.update_layout(
    title = "Total number of Hidden ZIP code for the \"State\" in North America " ,
    height = 700,
    margin = dict(t=0, b=0, l=0, r=0),
    font = dict(
        family="Courier New, monospace",
        size=11,
        color="RebeccaPurple"
    ) 
)
fig.update_traces(text = df_5_temp_2['State'])
fig.show()

The Above map shows the hidden rate of ZIP code for particular state in United States of America.¶

NOTE : The states which has military base(AP, AA, etc.,) are not shown here due to privacy issue.

Insight :¶

  • The state 'VT', 'ME', 'WV' having huge hidden rate in ZIP code.

  • Whereas, 'NV', 'CA' states has least amount of complaints which has hidden ZIP code.

1.11 Visualization for Number of complaints for desired 'Tags' : ¶

In [21]:
df_6 = pd.DataFrame(comp_df['Tags'].value_counts(dropna=False) )
df_6 = df_6.reset_index()
df_6['Tags'], df_6['count'] = df_6['index'], df_6['Tags']
df_6.drop(['index'], 1, inplace=True)
df_6
Out[21]:
Tags count
0 NaN 4757083
1 Servicemember 286553
2 Older American 159849
3 Older American, Servicemember 39515
In [22]:
fig = plx.pie(df_6, values='count', names='Tags')
fig.update_layout(
        title="Total number of complaints for the \"Tags\" " ,
        margin=dict(t=0, b=0, l=0, r=0),
        font=dict(
            family="Courier New, monospace",
            size=11,
            color="RebeccaPurple"
        )
)
fig.update_traces(text = df_6['Tags'], textinfo = 'percent+text')
fig.show()

Insight :¶

  • There are 3 different tags in the dataset.

  • 90.7% of the tags were null.

  • "Servicemember" contributes 5.47% of the dataset.

  • "Older American" contributes 3.05%.

  • "Older American, Servicemember" holds 0.754%.

1.12 Visualization for Number of complaints for desired 'Consumer consent provided?' : ¶

In [23]:
df_7 = pd.DataFrame(comp_df['Consumer consent provided?'].value_counts(dropna=False) )
df_7 = df_7.reset_index()
df_7['Consumer consent provided?'], df_7['count'] = df_7['index'], df_7['Consumer consent provided?']
df_7.drop(['index'], 1, inplace=True)
df_7
Out[23]:
Consumer consent provided? count
0 Consent not provided 2099875
1 Consent provided 1838262
2 NaN 1041536
3 Other 254303
4 Consent withdrawn 9024

The above dataframe shows the total number of complaints for each category in the column "Consumer consent provided?".¶

In [24]:
fig = plx.pie(df_7, values='count', names='Consumer consent provided?')
fig.update_layout(
        title="Total number of complaints for the \"Consumer consent provided?\" " ,
        margin=dict(t=0, b=0, l=0, r=0),
        font=dict(
            family="Courier New, monospace",
            size=11,
            color="RebeccaPurple"
        )
)
fig.update_traces(text = df_7['Consumer consent provided?'], textinfo = 'percent+text')
fig.show()

Insight :¶

  • Around 2 million (40.1%) complaints are consumer consent was not provided.

  • Around 1.8 million (35.1%) complaints are consumer consent was provided.

  • Around 1 million (19.9%) complaints are null.

  • Around 0.25 million (4.85%) complaints were recorded as "others".

  • The 9024 (0.172%) complaints were recorded in "Consent withdrawn" category.

1.13 Visualization for Number of complaints for desired 'Consumer consent provided?' : ¶

In [25]:
df_8 = pd.DataFrame(comp_df['Submitted via'].value_counts(dropna=False) )
df_8 = df_8.reset_index()
df_8['Submitted via'], df_8['count'] = df_8['index'], df_8['Submitted via']
df_8.drop(['index'], 1, inplace=True)
df_8
Out[25]:
Submitted via count
0 Web 4690912
1 Referral 248657
2 Phone 181242
3 Postal mail 94865
4 Fax 25658
5 Web Referral 1241
6 Email 425
In [26]:
fig = plx.pie(df_8, values='count', names='Submitted via')
fig.update_layout(
        title="Total number of complaints for the \"Submitted via\" " ,
        margin=dict(t=0, b=0, l=0, r=0),
        height = 800,
        width = 700,
        font=dict(
            family="Courier New, monospace",
            size=15,
            color="RebeccaPurple"
        )
)
fig.update_traces(text = df_8['Submitted via'], textinfo = 'percent+text')
fig.show()

Insight¶

  • Most of the complaints submitted via 'Web', which is 89.5%.

  • Whereas, Email submission are too low, which is 0.008%.

1.14 Visualization for 'Date received' vs 'Date sent to company' : ¶

In [27]:
try:
    with open('pickle files/comp_df.pkl', 'rb') as f:
        comp_df = pickle.load(f)
except:
    days_taken_to_process_complaint = []
    for i in tqdm(range(0, len(comp_df))):  
        yr_1 = comp_df['Date sent to company'][i].split('-')[0]
        mo_1 = comp_df['Date sent to company'][i].split('-')[1]
        dt_1 = comp_df['Date sent to company'][i].split('-')[2]
        yr_2 = comp_df['Date received'][i].split('-')[0]
        mo_2 = comp_df['Date received'][i].split('-')[1]
        dt_2 = comp_df['Date received'][i].split('-')[2]
        date1 = datetime(int(yr_2), int(mo_2), int(dt_2))
        date2 = datetime(int(yr_1), int(mo_1), int(dt_1))
        difference = date2 - date1
        days_taken_to_process_complaint.append(difference.days)
    comp_df['days taken to forward complaint'] = days_taken_to_process_complaint
    with open('pickle files/comp_df.pkl', 'wb') as f:
        pickle.dump(comp_df, f)

comp_df
Out[27]:
Date received Product Sub-product Issue Sub-issue Consumer complaint narrative Company public response Company State ZIP code Tags Consumer consent provided? Submitted via Date sent to company Company response to consumer Timely response? Consumer disputed? Complaint ID days taken to forward complaint
0 2024-03-13 Credit card General-purpose credit card or charge card Advertising and marketing, including promotion... Didn't receive advertised or promotional terms I formally request the removal of an inaccurat... NaN EQUIFAX, INC. MO 631XX NaN Consent provided Web 2024-03-13 Closed with non-monetary relief Yes NaN 8538710 0
1 2024-03-14 Credit reporting or other personal consumer re... Credit reporting Incorrect information on your report Information belongs to someone else NaN NaN EQUIFAX, INC. PA 19142 NaN Consent not provided Web 2024-03-14 Closed with non-monetary relief Yes NaN 8551289 0
2 2023-10-25 Credit reporting or other personal consumer re... Credit reporting Improper use of your report Reporting company used your report improperly NaN Company has responded to the consumer and the ... TRANSUNION INTERMEDIATE HOLDINGS, INC. CA 92114 NaN Consent not provided Web 2023-10-25 Closed with explanation Yes NaN 7755832 0
3 2023-10-25 Mortgage FHA mortgage Struggling to pay mortgage An existing modification, forbearance plan, sh... NaN Company has responded to the consumer and the ... Specialized Loan Servicing Holdings LLC CA 956XX NaN Other Web 2023-10-25 Closed with explanation Yes NaN 7753166 0
4 2024-05-02 Money transfer, virtual currency, or money ser... Domestic (US) money transfer Money was not available when promised NaN NaN NaN Sigue Corp. TX 78644 NaN Other Web 2024-05-02 In progress Yes NaN 8916876 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5242995 2023-08-02 Credit reporting, credit repair services, or o... Credit reporting Incorrect information on your report Information belongs to someone else NaN NaN EQUIFAX, INC. TX 76036 NaN Other Web 2023-08-02 Closed with non-monetary relief Yes NaN 7337991 0
5242996 2023-08-02 Payday loan, title loan, or personal loan Installment loan Problem with additional add-on products or ser... NaN NaN NaN NBT BANCORP INC. NJ 08046 NaN NaN Phone 2023-08-02 Closed with explanation Yes NaN 7343156 0
5242997 2023-08-02 Mortgage FHA mortgage Struggling to pay mortgage NaN NaN Company believes it acted appropriately as aut... Chesmar Homes TX 77316 NaN Other Web 2023-08-17 Closed with explanation Yes NaN 7340709 15
5242998 2023-08-02 Credit reporting, credit repair services, or o... Credit reporting Improper use of your report Reporting company used your report improperly NaN NaN EQUIFAX, INC. GA 30013 NaN Consent not provided Web 2023-08-02 Closed with explanation Yes NaN 7341166 0
5242999 2023-08-04 Payday loan, title loan, or personal loan Payday loan Charged fees or interest you didn't expect NaN NaN NaN Flandreau Santee Sioux Tribe TX 75057 NaN Consent not provided Web 2023-08-18 Closed with explanation Yes NaN 7350142 14

5243000 rows × 19 columns

In [28]:
df_9 = pd.DataFrame(comp_df['days taken to forward complaint'].value_counts(dropna=False) )
df_9 = df_9.reset_index()
df_9['days taken to forward complaint'], df_9['count'] = df_9['index'], df_9['days taken to forward complaint']
df_9.drop(['index'], 1, inplace=True)
df_9
Out[28]:
days taken to forward complaint count
0 0 4520949
1 1 180347
2 2 85372
3 3 68321
4 4 62247
... ... ...
526 607 1
527 909 1
528 621 1
529 855 1
530 1553 1

531 rows × 2 columns

The above table shows time taken to forward the complaints to the company.¶

Insight :¶

Around 86.23% of the complaints were forwarded in the same day.¶

Whereas, the worst case complaint took 1553 days to forward.¶

1.15 Visualization for Number of complaints for desired 'Company response to consumer' : ¶

In [29]:
df_10 = pd.DataFrame(comp_df['Company response to consumer'].value_counts(dropna=False) )
df_10 = df_10.reset_index()
df_10['Company response to consumer'], df_10['count'] = df_10['index'], df_10['Company response to consumer']
df_10.drop(['index'], 1, inplace=True)
df_10
Out[29]:
Company response to consumer count
0 Closed with explanation 3473372
1 Closed with non-monetary relief 1330021
2 In progress 233378
3 Closed with monetary relief 154043
4 Closed without relief 17868
5 Closed 17611
6 Untimely response 11388
7 Closed with relief 5304
8 NaN 15
In [30]:
fig = plx.pie(df_10, values='count', names='Company response to consumer')
fig.update_layout(
        title="Total number of complaints for the \"Company response to consumer\" " ,
        margin=dict(t=0, b=0, l=0, r=0),
        font=dict(
            family="Courier New, monospace",
            size=15,
            color="RebeccaPurple"
        )
)
fig.update_traces(text = df_10['Company response to consumer'], textinfo = 'percent+text')
fig.show()

Insight :¶

  • Around 2/3 (66.2%) of the complaints were closed with explanation.

  • 25.4% of the complaints were closed with non-monetary relief.

  • 4.45% of the complaints were in progress.

  • 2.94% of the complaints were closed with monetary relief.

  • 0.341% of the complaints were closed without relief.

  • 0.336% of the complaints were closed.

  • 0.217% of the complaints were untimely responded.

  • 0.101% of the complaints were closed with relief.

  • 0.000286% of null values.

1.16 Visualization for Number of complaints for desired 'Timely response?' : ¶

In [31]:
df_11 = pd.DataFrame(comp_df['Timely response?'].value_counts(dropna=False) )
df_11 = df_11.reset_index()
df_11['Timely response?'], df_11['count'] = df_11['index'], df_11['Timely response?']
df_11.drop(['index'], 1, inplace=True)
df_11
Out[31]:
Timely response? count
0 Yes 5184323
1 No 58677
In [32]:
fig = plx.pie(df_11, values='count', names='Timely response?')
fig.update_layout(
        title="Total number of complaints for the \"Timely response?\" " ,
        margin=dict(t=0, b=0, l=0, r=0),
        height = 450,
        font=dict(
            family="Courier New, monospace",
            size=15,
            color="RebeccaPurple"
        )
)
fig.update_traces(text = df_11['Timely response?'], textinfo = 'percent+text')
fig.show()

Insight :¶

  • 98.9% of the complaints were timely responded.

  • 1.12% of the complaints were not responded timely.

1.17 Visualization for Number of complaints for desired 'Consumer disputed?' : ¶

In [33]:
df_12 = pd.DataFrame(comp_df['Consumer disputed?'].value_counts(dropna=False) )
df_12 = df_12.reset_index()
df_12['Consumer disputed?'], df_12['count'] = df_12['index'], df_12['Consumer disputed?']
df_12.drop(['index'], 1, inplace=True)
df_12
Out[33]:
Consumer disputed? count
0 NaN 4474684
1 No 619938
2 Yes 148378
In [34]:
fig = plx.pie(df_12, values='count', names='Consumer disputed?')
fig.update_layout(
        title="Total number of complaints for the \"Consumer disputed?\" " ,
        margin=dict(t=0, b=0, l=0, r=0),
        height = 450,
        font=dict(
            family="Courier New, monospace",
            size=15,
            color="RebeccaPurple"
        )
)
fig.update_traces(text = df_12['Consumer disputed?'], textinfo = 'percent+text')
fig.show()

Insight :¶

  • 85.3% of the data is null in the column 'Consumer disputed?'.

  • 11.8% of the data were disputed consumers.

  • 2.83% of the data were non disputed consumers.

1.18 Visualization of 'State' vs 'Consumer disputed?': ¶

In [35]:
# creating a dataframe holds disputed/non-disputed count for the particular state
df_13 = pd.DataFrame(comp_df['State'].value_counts() )
df_13 = df_13.reset_index()
df_13['State'], df_13['count'] = df_13['index'], df_13['State']
df_13.drop(['index'], 1, inplace=True)
df_13
Out[35]:
State count
0 FL 629941
1 CA 597278
2 TX 560364
3 GA 360174
4 NY 336721
... ... ...
58 AA 59
59 MP 57
60 AS 50
61 MH 33
62 PW 13

63 rows × 2 columns

In [36]:
try:
    with open('pickle files/df_13.pkl', 'rb') as f:
        df_13 = pickle.load(f)  
except:
    consumer_disputed_null, consumer_disputed_yes, consumer_disputed_no = [], [], []
    for x in tqdm(df_13['State']):
        temp_df_null_count = comp_df.where(( (comp_df['Consumer disputed?'].isna()) & (comp_df['State'] == x) ))
        temp_df_null_count.dropna(how='all',inplace=True)
        consumer_disputed_null.append(len(temp_df_null_count))

        temp_df_yes_count = comp_df.where(( (comp_df['Consumer disputed?'] == 'Yes') & (comp_df['State'] == x) ))
        temp_df_yes_count.dropna(how='all',inplace=True)
        consumer_disputed_yes.append(len(temp_df_yes_count))

        temp_df_no_count = comp_df.where(( (comp_df['Consumer disputed?'] == 'No') & (comp_df['State'] == x) ))
        temp_df_no_count.dropna(how='all',inplace=True)
        consumer_disputed_no.append(len(temp_df_no_count))
    df_13['consumer disputed null'], df_13['consumer disputed yes'], df_13['consumer disputed no'] = consumer_disputed_null, consumer_disputed_yes, consumer_disputed_no
    df_13['non_null_%'] = list((df_13['consumer disputed yes'] + df_13['consumer disputed no']) / df_13['count'] * 100)
    with open('pickle files/df_13.pkl', 'wb') as f:
        pickle.dump(df_13, f)
    
df_13
Out[36]:
State count consumer disputed null consumer disputed yes consumer disputed no non_null_%
0 FL 629941 556356 13904 59681 11.681253
1 CA 597278 486666 22587 88025 18.519349
2 TX 560364 500965 10681 48718 10.600074
3 GA 360174 323936 6739 29499 10.061248
4 NY 336721 284191 9841 42689 15.600453
... ... ... ... ... ... ...
58 AA 59 46 3 10 22.033898
59 MP 57 26 5 26 54.385965
60 AS 50 25 6 19 50.000000
61 MH 33 3 7 23 90.909091
62 PW 13 0 4 9 100.000000

63 rows × 6 columns

The above table shows total complaint counts for the particular state, consumer disputed/non-disputed count, nan count of 'Consumer disputed?' and non null percentage.¶

The non null percentage column in the above dataframe shows the rate of response of column 'Consumer disputed?' for the particular 'State'.¶

In [37]:
fig = plx.choropleth(df_13, locations='State',locationmode='USA-states', color='non_null_%',
                           color_continuous_scale="Viridis",
                           range_color=(0, max(df_13['non_null_%'])),
                           scope="north america",
                           labels={'Consumer Disputed': 'non null rate %'}
                          )
fig.update_layout(
    title = " The rate of response of column 'Consumer disputed?' for the particular 'State'. " ,
    height = 700,
    margin = dict(t=0, b=0, l=0, r=0),
    font = dict(
        family="Courier New, monospace",
        size=11,
        color="RebeccaPurple"
    ) 
)
fig.show()

NOTE : The states which has military base (AP, AA, etc.,) are not shown here due to privacy issue.¶

The above map shows the rate of response() of column 'Consumer disputed?' for the particular 'State'.¶

1.19 Visualization of 'Submitted via' vs 'Consumer disputed?': ¶

In [38]:
try:
    with open('pickle files/df_14.pkl', 'rb') as f:
        df_14 = pickle.load(f)
except:
    df_14 = pd.DataFrame(comp_df['Submitted via'].value_counts(dropna=False) )
    df_14= df_14.reset_index()
    df_14['Submitted via'], df_14['count'] = df_14['index'], df_14['Submitted via']
    df_14.drop(['index'], 1, inplace=True)

    consumer_disputed_null, consumer_disputed_yes, consumer_disputed_no = [], [], []
    for x in tqdm(df_14['Submitted via']):
        temp_df_null_count = comp_df.where(( (comp_df['Consumer disputed?'].isna()) & (comp_df['Submitted via'] == x) ))
        temp_df_null_count.dropna(how='all',inplace=True)
        consumer_disputed_null.append(len(temp_df_null_count))

        temp_df_yes_count = comp_df.where(( (comp_df['Consumer disputed?'] == 'Yes') & (comp_df['Submitted via'] == x) ))
        temp_df_yes_count.dropna(how='all',inplace=True)
        consumer_disputed_yes.append(len(temp_df_yes_count))

        temp_df_no_count = comp_df.where(( (comp_df['Consumer disputed?'] == 'No') & (comp_df['Submitted via'] == x) ))
        temp_df_no_count.dropna(how='all',inplace=True)
        consumer_disputed_no.append(len(temp_df_no_count))
    df_14['consumer disputed null'], df_14['consumer disputed yes'], df_14['consumer disputed no'] = consumer_disputed_null, consumer_disputed_yes, consumer_disputed_no
    df_14['non_null_%'] = list((df_14['consumer disputed yes'] + df_14['consumer disputed no']) / df_14['count'] * 100)
    with open('pickle files/df_14.pkl', 'wb') as f:
        pickle.dump(df_14, f)
    
df_14
Out[38]:
Submitted via count consumer disputed null consumer disputed yes consumer disputed no non_null_%
0 Web 4690912 4167966 110151 412795 11.148067
1 Referral 248657 115416 21569 111672 53.584255
2 Phone 181242 129056 8079 44107 28.793547
3 Postal mail 94865 46193 6411 42261 51.306594
4 Fax 25658 14735 2096 8827 42.571518
5 Web Referral 1241 1241 0 0 0.000000
6 Email 425 77 72 276 81.882353
In [39]:
plx.bar(df_14, x = df_14['Submitted via'], y = df_14['non_null_%'])

Insight :¶

  • 81.88% response (consumer disputed?) for the complaints submitted via 'Email'

  • 53.58% response (consumer disputed?) for the complaints submitted via 'Referral'

  • 51.30% response (consumer disputed?) for the complaints submitted via 'Postal mail'

  • 42.57% response (consumer disputed?) for the complaints submitted via 'Fax'

  • 28.79% response (consumer disputed?) for the complaints submitted via 'Phone'

  • 11.14% response (consumer disputed?) for the complaints submitted via 'Web'

  • 0% response (consumer disputed?) for the complaints submitted via 'Web Referral'

Conclusion:¶

In [40]:
comp_df.head(3)
Out[40]:
Date received Product Sub-product Issue Sub-issue Consumer complaint narrative Company public response Company State ZIP code Tags Consumer consent provided? Submitted via Date sent to company Company response to consumer Timely response? Consumer disputed? Complaint ID days taken to forward complaint
0 2024-03-13 Credit card General-purpose credit card or charge card Advertising and marketing, including promotion... Didn't receive advertised or promotional terms I formally request the removal of an inaccurat... NaN EQUIFAX, INC. MO 631XX NaN Consent provided Web 2024-03-13 Closed with non-monetary relief Yes NaN 8538710 0
1 2024-03-14 Credit reporting or other personal consumer re... Credit reporting Incorrect information on your report Information belongs to someone else NaN NaN EQUIFAX, INC. PA 19142 NaN Consent not provided Web 2024-03-14 Closed with non-monetary relief Yes NaN 8551289 0
2 2023-10-25 Credit reporting or other personal consumer re... Credit reporting Improper use of your report Reporting company used your report improperly NaN Company has responded to the consumer and the ... TRANSUNION INTERMEDIATE HOLDINGS, INC. CA 92114 NaN Consent not provided Web 2023-10-25 Closed with explanation Yes NaN 7755832 0
In [41]:
conclusion_df = comp_df.where( (comp_df['Consumer disputed?'] == 'Yes'))
conclusion_df.dropna(how='all', inplace=True)
In [42]:
conclusion_df
Out[42]:
Date received Product Sub-product Issue Sub-issue Consumer complaint narrative Company public response Company State ZIP code Tags Consumer consent provided? Submitted via Date sent to company Company response to consumer Timely response? Consumer disputed? Complaint ID days taken to forward complaint
4653 2015-06-12 Mortgage Conventional fixed mortgage Loan modification,collection,foreclosure NaN Dear Sir or Madam : In Re : loan # XXXX reason... NaN NATIONSTAR MORTGAGE LLC LA 70058 Older American Consent provided Web 2015-06-15 Closed with explanation Yes Yes 1419666.0 3.0
6548 2012-09-27 Mortgage Conventional fixed mortgage Loan modification,collection,foreclosure NaN NaN NaN Specialized Loan Servicing Holdings LLC NJ 07104 Servicemember NaN Web 2012-09-28 Closed Yes Yes 161317.0 1.0
7916 2013-08-08 Credit reporting NaN Incorrect information on credit report Public record NaN NaN EQUIFAX, INC. GA 30024 NaN NaN Web 2013-08-08 Closed with non-monetary relief Yes Yes 482995.0 0.0
8007 2016-10-13 Mortgage FHA mortgage Loan servicing, payments, escrow account NaN My original home loan was with XXXX made in 20... Company believes it acted appropriately as aut... NATIONSTAR MORTGAGE LLC TX 77373 NaN Consent provided Web 2016-10-13 Closed with explanation Yes Yes 2159897.0 0.0
8022 2016-11-21 Mortgage FHA mortgage Loan servicing, payments, escrow account NaN My mortgage is currently held by Nationstar Mo... NaN NATIONSTAR MORTGAGE LLC MO 64118 NaN Consent provided Web 2016-11-21 Closed with explanation Yes Yes 2217145.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5242296 2014-04-21 Credit card NaN Rewards NaN NaN NaN BANK OF AMERICA, NATIONAL ASSOCIATION NJ 07302 NaN NaN Web 2014-04-21 Closed with explanation Yes Yes 816833.0 0.0
5242303 2012-05-20 Credit card NaN Other NaN NaN NaN CITIBANK, N.A. IL 62454 NaN NaN Referral 2012-06-20 Closed with explanation Yes Yes 85101.0 31.0
5242311 2014-07-22 Credit card NaN Billing statement NaN NaN NaN CAPITAL ONE FINANCIAL CORPORATION CA 92618 NaN NaN Web 2014-07-22 Closed with explanation Yes Yes 947372.0 0.0
5242312 2012-07-20 Student loan Non-federal student loan Repaying your loan NaN NaN NaN CITIBANK, N.A. FL 33415 NaN NaN Phone 2012-07-23 Closed with explanation Yes Yes 120914.0 3.0
5242314 2012-03-11 Credit card NaN Late fee NaN NaN NaN CITIBANK, N.A. NY 10001 NaN NaN Referral 2012-05-26 Closed with relief Yes Yes 33570.0 76.0

148378 rows × 19 columns

In [43]:
conclusion_df.columns
Out[43]:
Index(['Date received', 'Product', 'Sub-product', 'Issue', 'Sub-issue',
       'Consumer complaint narrative', 'Company public response', 'Company',
       'State', 'ZIP code', 'Tags', 'Consumer consent provided?',
       'Submitted via', 'Date sent to company', 'Company response to consumer',
       'Timely response?', 'Consumer disputed?', 'Complaint ID',
       'days taken to forward complaint'],
      dtype='object')
In [44]:
conclusion_df.drop(['Date received', 'Consumer complaint narrative', 'Company public response', 'Complaint ID', 'ZIP code', 'Tags','Company response to consumer',
       'days taken to forward complaint', 'Consumer consent provided?','Timely response?', 'Submitted via', 'Date sent to company'], axis=1, inplace=True)
conclusion_df.reset_index(inplace=True,drop=True)
conclusion_df
Out[44]:
Product Sub-product Issue Sub-issue Company State Consumer disputed?
0 Mortgage Conventional fixed mortgage Loan modification,collection,foreclosure NaN NATIONSTAR MORTGAGE LLC LA Yes
1 Mortgage Conventional fixed mortgage Loan modification,collection,foreclosure NaN Specialized Loan Servicing Holdings LLC NJ Yes
2 Credit reporting NaN Incorrect information on credit report Public record EQUIFAX, INC. GA Yes
3 Mortgage FHA mortgage Loan servicing, payments, escrow account NaN NATIONSTAR MORTGAGE LLC TX Yes
4 Mortgage FHA mortgage Loan servicing, payments, escrow account NaN NATIONSTAR MORTGAGE LLC MO Yes
... ... ... ... ... ... ... ...
148373 Credit card NaN Rewards NaN BANK OF AMERICA, NATIONAL ASSOCIATION NJ Yes
148374 Credit card NaN Other NaN CITIBANK, N.A. IL Yes
148375 Credit card NaN Billing statement NaN CAPITAL ONE FINANCIAL CORPORATION CA Yes
148376 Student loan Non-federal student loan Repaying your loan NaN CITIBANK, N.A. FL Yes
148377 Credit card NaN Late fee NaN CITIBANK, N.A. NY Yes

148378 rows × 7 columns

In [45]:
conclusion_df.drop(['Consumer disputed?'],axis=1,inplace=True)
conclusion_df['Consumer disputed?'] = [1] * len(conclusion_df)
conclusion_df['all'] = ['all'] * len(conclusion_df)
conclusion_df = conclusion_df.astype({"State": str})
conclusion_df
Out[45]:
Product Sub-product Issue Sub-issue Company State Consumer disputed? all
0 Mortgage Conventional fixed mortgage Loan modification,collection,foreclosure NaN NATIONSTAR MORTGAGE LLC LA 1 all
1 Mortgage Conventional fixed mortgage Loan modification,collection,foreclosure NaN Specialized Loan Servicing Holdings LLC NJ 1 all
2 Credit reporting NaN Incorrect information on credit report Public record EQUIFAX, INC. GA 1 all
3 Mortgage FHA mortgage Loan servicing, payments, escrow account NaN NATIONSTAR MORTGAGE LLC TX 1 all
4 Mortgage FHA mortgage Loan servicing, payments, escrow account NaN NATIONSTAR MORTGAGE LLC MO 1 all
... ... ... ... ... ... ... ... ...
148373 Credit card NaN Rewards NaN BANK OF AMERICA, NATIONAL ASSOCIATION NJ 1 all
148374 Credit card NaN Other NaN CITIBANK, N.A. IL 1 all
148375 Credit card NaN Billing statement NaN CAPITAL ONE FINANCIAL CORPORATION CA 1 all
148376 Student loan Non-federal student loan Repaying your loan NaN CITIBANK, N.A. FL 1 all
148377 Credit card NaN Late fee NaN CITIBANK, N.A. NY 1 all

148378 rows × 8 columns

1. The Impact of 'Product' category on 'consumer disputes'¶

In [46]:
plx.treemap(conclusion_df, path=['all','Product'], values='Consumer disputed?')

2. The Impact of 'Issue' category on 'consumer disputes'¶

In [47]:
plx.treemap(conclusion_df, path=['all','Issue'], values='Consumer disputed?')

3. The Impact of 'Company' on 'consumer disputes'¶

In [48]:
plx.treemap(conclusion_df, path=['all','Company'], values='Consumer disputed?')

4. 'consumer disputes' on 'State' level¶

In [49]:
conclusion_df.columns
Out[49]:
Index(['Product', 'Sub-product', 'Issue', 'Sub-issue', 'Company', 'State',
       'Consumer disputed?', 'all'],
      dtype='object')
In [50]:
plx.treemap(conclusion_df, path=['all','State'], values='Consumer disputed?')

5. The Impact of 'Issue' category on 'Product' of 'Consumer disputes'¶

In [51]:
plx.treemap(conclusion_df, path=['all','Product','Issue'], values='Consumer disputed?')

6. Everything in a Single Visualization¶

In [52]:
plx.treemap(conclusion_df, path=['all','Company','Product','Issue'], values='Consumer disputed?')

The above treemap show that number of disputes on each company, product and Issue. For example, if I'm clicking 1st company 'Bank of America...',it will be zoomed and shows which product category and issue category has huge impact for consumer disputes.¶

2)¶

In [53]:
# longest continuous increasing subsequence (subarray).
"""
--> The variable 'i' in the 1st 'for' loop returns the starting index of the required subarray.
--> The variable 'j' in the 2nd 'for' loop returns the ending index of the required subarray.
--> For example, given array = [1,3,5,4,7]. 
    Possible sub array (1st) = [1,3,5,4,7], 2nd = [3,5,4,7], 3rd = [5,4,7], 4th= [4,7].
--> The approach to find continuous increasing subsequence from the sub array are, 
    For instance we can consider 1st sub array[1,3,5,4,7]. We have to check one by one, like 
    (1,3), (3,5), (5,4), (4,7). We have to break once 1st value is greater than 2nd value in the pair.
    Here the increasing pair count is 2 for this paricular sub array. 
--> The same way we have to check for possible subarray in a list. At each iteration the count is updated. 
--> The longest continuous possible pair is (1,3) and (3,5). The subarray should be [1,3,5], 
    The longth of longest continuous increasing subsequence is possible pair count + 1, which is 2(pairs) + 1 = 3 
--> If the array is in descending order or if an array contains same values ([2,2,2,2,2]). Then, the longest 
    continuous increasing subsequence should be 1, which is [2].
    
"""
arr = [1,3,5,4,7]
#arr = [2,2,2,2,2]
long_incr_subseq = 0
for i in range(0, len(arr)-1):
    count = 0
    new_i = i
    for j in range(i+1, len(arr)):
        curr_val, next_val= arr[new_i], arr[j]
        
        if curr_val < next_val:
            count += 1
        else:
            break
        new_i += 1
    if count > long_incr_subseq:
        long_incr_subseq = count

if long_incr_subseq > 0:
    long_incr_subseq += 1
    print(long_incr_subseq)       
elif long_incr_subseq == 0:
    print(long_incr_subseq + 1)
3

3)¶

In [54]:
# Largest possible number from the list
"""
--> permutations gives us the possible combinations. 
--> Converting from string to interger, and then maximum of the array gives us the largest possible number.
"""
from itertools import permutations
arr_3 = [3,30,34,5,9]
possible_out = list(permutations(arr_3, len(arr_3)))
possible_out_int = []
for x in possible_out:
    x = ''.join([str(q) for q in list(x)])
    possible_out_int.append(int(x))
print(max(possible_out_int))
9534330

4)¶

In [55]:
# Store all the "servlet-name", and "servlet-class" in a csv file
"""
--> The json file is saved in the current location of this notebook.
--> 'json' package is used to import and read the required JSON file.
--> "file['web-app']['servlet']" is used to access servlet key of the file(json) which holds all the servlet name and class.
--> The accessed servlet names and classes were stored in the list.
--> 'pandas' library is used to convert the list into dataframe. Then, dataframe is converted into csv file and stored
    in a current location.
--> 
"""
import json
import pandas as pd

with open('DT A1 sample_json (1) (1) (3).json', 'r') as f:
    file = json.load(f)
servlet_name, servlet_class = [], []
for x in file['web-app']['servlet']:
    servlet_name.append(x['servlet-name'])
    servlet_class.append(x['servlet-class'])
df = pd.DataFrame()
df['servlet-name'], df['servlet-class'] = servlet_name, servlet_class
df.to_csv('servlet_name_class.csv',index=False)
df
Out[55]:
servlet-name servlet-class
0 cofaxCDS org.cofax.cds.CDSServlet
1 cofaxEmail org.cofax.cds.EmailServlet
2 cofaxAdmin org.cofax.cds.AdminServlet
3 fileServlet org.cofax.cds.FileServlet
4 cofaxTools org.cofax.cms.CofaxToolsServlet